Statistical significance and extremal ensemble of gapped local hybrid alignment

Authors

  • Yi-Kuo Yu
  • Ralf Bundschuh
  • Terence Hwa
Abstract

A “semi-probabilistic” alignment algorithm that combines ideas from Smith-Waterman and probabilistic alignment is proposed and studied in detail. The score statistics of this “hybrid” algorithm are predicted to be of the universal Gumbel form, with the key Gumbel parameter λ taking on a fixed asymptotic value for a wide variety of scoring parameters. We also characterize the “extremal ensemble”, i.e., the collection of sequence pairs exhibiting the similarities to which a given scoring system is most sensitive. Based on this extremal ensemble, we give a simple recipe for computing the “relative entropy” and, from it, the correction to λ due to finite sequence length. This allows us to assign p-values to the alignment results for arbitrary scoring parameters and gap costs. The predictions compare well with direct numerical simulations for a broad range of sequence lengths and various choices of substitution scores and affine gap parameters.
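As an illustration of how a Gumbel-form score distribution translates into a p-value, the sketch below uses the standard extreme-value expression P(S ≥ s) ≈ 1 − exp(−K·M·N·e^(−λs)) for sequences of lengths M and N. The function name `gumbel_p_value`, the prefactor `k`, and every numerical input are placeholders chosen for illustration, not values from the paper, and the finite-length correction to λ described in the abstract is not modeled here.

```python
import math


def gumbel_p_value(score, lam, k, m, n):
    """Return P(S >= score) for a Gumbel-distributed alignment score.

    lam  : Gumbel decay parameter (lambda); assumed to be known already
    k    : Gumbel prefactor K; a placeholder input, not taken from the paper
    m, n : lengths of the two aligned sequences
    """
    # Expected number of chance alignments scoring at least `score`.
    expected = k * m * n * math.exp(-lam * score)
    # 1 - exp(-expected), written with expm1 to stay accurate when the
    # expected count is tiny.
    return -math.expm1(-expected)


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(gumbel_p_value(score=30.0, lam=1.0, k=0.1, m=300, n=300))
```

For small p-values the result is close to the expected count K·M·N·e^(−λs) itself, which is why a small shift in λ changes the reported significance noticeably.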

Related resources

Rapid Assessment of Extremal Statistics for Gapped Local Alignment

The statistical significance of gapped local alignments is characterized by analyzing the extremal statistics of the scores obtained from the alignment of random amino acid sequences. By identifying a complete set of linked clusters, "islands," we devise a method which accurately predicts the extremal score statistics by using only one to a few pairwise alignments. The success of our method rel...

Statistical Significance of Probabilistic Sequence Alignment and Related Local Hidden Markov Models

The score statistics of probabilistic gapped local alignment of random sequences is investigated both analytically and numerically. The full probabilistic algorithm (e.g., the "local" version of maximum-likelihood or hidden Markov model method) is found to have anomalous statistics. A modified "semi-probabilistic" alignment consisting of a hybrid of Smith-Waterman and probabilistic alignment is...

Score distributions of gapped multiple sequence alignments down to the low-probability tail.

Assessing the significance of alignment scores of optimally aligned DNA or amino acid sequences can be achieved via the knowledge of the score distribution of random sequences. But this requires obtaining the distribution in the biologically relevant high-scoring region, where the probabilities are exponentially small. For gapless local alignments of infinitely long sequences this distribution ...

Enhancing Parallelism of Pairwise Statistical Significance Estimation for Local Sequence Alignment

Pairwise statistical significance (PSS) has been found to be able to accurately identify related sequences (homology detection), which is a fundamental step in numerous applications relating to sequence analysis. Although more accurate than database statistical significance, it is both computationally intensive and data intensive to construct the empirical score distribution during the estimati...

Random differential inequalities and comparison principles for nonlinear hybrid random differential equations

In this paper, some basic results concerning strict, nonstrict inequalities, local existence theorem and differential inequalities have been proved for an IVP of first order hybrid random differential equations with the linear perturbation of second type. A comparison theorem is proved and applied to prove the uniqueness of random solution for the considered perturbed random differential eq...

Publication date: 2002